Spelling Error Patterns in Brazilian Portuguese
Abstract
Fifty years after Damerau published his statistics on the distribution of errors in typed texts, his findings are still applied across a range of languages. Because these statistics were derived from English texts, the question of whether they actually apply to other languages has been raised. We address this issue by analyzing a set of typed texts in Brazilian Portuguese and deriving statistics tailored to this language. Results show that diacritical marks play a major role, as indicated by the frequency of mistakes involving them. This renders Damerau’s original findings mostly unfit for spelling correction systems, although they remain useful if such marks are set aside. Furthermore, a comparison between these results and those published for Spanish shows no statistically significant differences between the two languages, an indication that the distribution of spelling errors depends on the adopted character set rather than on the language itself.
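To make the error categories concrete, the following is a minimal sketch and not the authors' procedure. It assigns a (typed, intended) word pair to one of the four single-error classes usually attributed to Damerau (insertion, deletion, substitution, transposition) and separately flags pairs that differ only in diacritical marks. The function names `strip_diacritics`, `classify_single_error`, and `is_diacritic_error` are hypothetical, and the sample words are illustrative, not taken from the paper's data.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks, so that 'ação' becomes 'acao'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def classify_single_error(typed: str, intended: str) -> str:
    """Return the single-error class of a misspelling, or 'other'.

    Assumption: a precomposed accented letter counts as one character,
    so both strings are normalized to NFC before comparison.
    """
    typed = unicodedata.normalize("NFC", typed)
    intended = unicodedata.normalize("NFC", intended)
    if typed == intended:
        return "correct"

    shortest = min(len(typed), len(intended))
    # Length of the common prefix.
    i = 0
    while i < shortest and typed[i] == intended[i]:
        i += 1
    # Length of the common suffix that does not overlap the prefix.
    j = 0
    while j < shortest - i and typed[-1 - j] == intended[-1 - j]:
        j += 1

    typed_mid = typed[i:len(typed) - j]
    intended_mid = intended[i:len(intended) - j]

    if len(typed_mid) == 1 and not intended_mid:
        return "insertion"      # an extra character was typed
    if not typed_mid and len(intended_mid) == 1:
        return "deletion"       # a character was left out
    if len(typed_mid) == 1 and len(intended_mid) == 1:
        return "substitution"   # one character replaced by another
    if len(typed_mid) == 2 and typed_mid == intended_mid[::-1]:
        return "transposition"  # two adjacent characters swapped
    return "other"              # more than one single error


def is_diacritic_error(typed: str, intended: str) -> bool:
    """True if the two words differ only in diacritical marks."""
    return typed != intended and strip_diacritics(typed) == strip_diacritics(intended)
```

Under these assumptions, typing "voce" for "você" counts as a substitution that is also a diacritic error, while "acao" for "ação" falls outside the single-error classes because two marks are missing.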
Similar resources
Unconventional word segmentation in Brazilian children’s early text production
An important element of learning to read and write at school is the ability to define word boundaries. Defining word boundaries in text writing is not a straightforward task even for children who have mastered graphophonemic correspondences. In children’s writing, unconventional word segmentation has been observed across a range of languages and contexts with more occurrences of hyposegmentatio...
Dermatology and the Brazilian Portuguese language orthographic reform.
The Brazilian Portuguese language orthographic reform has changed the spelling of less than 2% of the lexis. However, these changes have affected medical practice. In this article, the authors present the main changes in the orthographic rules and gather a group of words whose spelling was altered by the reform, with emphasis on dermatological terms.
‘Minor’ Languages, ‘Broken’ Translations: On Brazilian Reworkings of an Albanian Novel
This essay approaches the challenges of global translation in the 21st century from what might still be considered a somewhat uncommon example: a direct translation of Ismail Kadaré's 1978 novel Prill e thyër (Broken April) from the original Albanian into Brazilian Portuguese in 2001. Not only does it examine and compare lexical elements in the source and target texts and the usage of translato...
Automatic Detection of Spelling Variation in Historical Corpus: An Application to Build a Brazilian Portuguese Spelling Variants Dictionary
The Historical Dictionary of Brazilian Portuguese (HDBP), the first of its kind, is based on a corpus of Brazilian Portuguese (BP) texts from the sixteenth through the eighteenth centuries (and some texts from the beginning of the nineteenth century), being developed under the sponsorship of the Brazilian funding agency CNPq (Conselho Nacional de Desenvolvimento Científico e Tecnológico). It is...
Building a Corpus-based Historical Portuguese Dictionary: Challenges and Opportunities
Historical corpora are important resources for different areas. Philology, Human Language Technology, Literary Studies, History, and Lexicography are some that benefit from them. However, compiling historical corpora is different from compiling contemporary corpora. Corpus designers have to deal with several characteristics inherent in historical texts, such as: absence of a spelling standard, ...
Journal: Computational Linguistics
Volume: 41
Publication date: 2015